Coexistence of native and denatured phases in a single protein-like molecule 
In order to understand the nuclei which develop during the course of protein folding and unfolding, we examine 
phase segregation of a single heteropolymer chain which occurs in equilibrium. These segregated conformations 
are characterized by a nucleus of monomers which are superimposable upon the native conformation. We com- 
putationally generate the phase segregation by applying a "folding pressure," or adding an energetic bonus for 
native monomer-monomer contacts. The computer models reveal a fundamental difference in the nucleation pro- 
cess between heteropolymeric and the more familiar vapor-liquid systems: in a polymer system, some nuclei hinder 
folding via topological constraints and must be partially destroyed in order for folding to proceed. To illustrate 
this finding, we examine the kinetics of protein unfolding in the long chain limit through scaling arguments. We 
find that because of the topological constraints, the critical nucleus size is of the order of the entire chain size so 
that the unfolding time scales as $\exp[cN^{2/3}]$, where $N$ is the chain length and $c$ is a constant. 



Proteins fold and unfold cooperatively, through a
transition accompanied by a large latent heat and other
signs of discontinuity. As long as a single protein
molecule can be described in statistical terms, folding
and unfolding should be identified as first order phase
transitions. According to sophisticated equilibrium
statistical mechanics models, this sharpness, or
cooperativity, of the transition is a direct manifestation of
protein nonrandomness. Indeed, proteins are heteropolymers,
but with nonrandom sequences that have been selected,
presumably in the course of evolution. It remains
an outstanding challenge to understand why and how this
selection causes proteins to meet the kinetic requirements
of rapid (milliseconds to seconds) and reliable folding.

Since the first order phase transition nature of folding,
with its connection to the evolutionary selection of
sequences, was realized, much attention has been paid
to nucleation as a natural scenario of folding. The
straightforward implementation of the nucleation idea,
however appealing, faces difficulties, as evidenced by the
recent heated debates. In hindsight, these difficulties
are hardly surprising, as chain connectivity must
significantly modify the very concept of nucleation in ways
for which we do not yet have the proper insight.

Our goal in the present paper is to gain such an insight 
by reexamining the foundations of the nucleation concept 
for proteins and computationally generating conforma- 
tions containing candidates for nuclei for toy proteinlike 
models. We will systematically employ the analogy with 
other first order phase transitions, such as the liquid- 
gas transition, where the kinetic concept of nucleation is 
ultimately related to the phase segregated states in equi- 
librium. Indeed, the nucleus is nothing more than a piece 
of a "new" equilibrium phase which tentatively coexists 
with the surrounding sea of the "old" phase. Of course, 
the nucleus is not really in equilibrium since it grows. 
However, and we view that as the single major lesson to
be learned from a liquid-gas type system [10], the nucleus
is, in the proper sense, close to equilibrium. Indeed, a
nucleus grows slowly, in the sense that all other degrees of
freedom have sufficient time to relax while the nucleus
size does not change appreciably. The nucleus size can
thus be said to be a "good reaction coordinate." In terms
of landscape theory [11], the relevant profile is then that
of the free energy taken as a function of the nucleus size.
The important part of that picture is, again, that the
nucleus is fairly close to being in equilibrium.

Thus, we have to address equilibrium phase segre- 
gation and phase coexistence in a proteinlike molecule. 
Note that we are referring to the equilibrium coexistence 
of two distinct phases within a single protein chain which 
should not be confused with the coexistence of folded 
and unfolded molecules in solution. In order to address 
the phase segregated (macro)states, we resort again to 
the analogy with a liquid-gas system for which there are 
two ways to bring the system into the phase segregated 
state: one is to bring it exactly to the transition tem- 
perature, and then add (or remove) some heat; the other 
involves a range of temperatures and the control of pres- 
sure and volume. The former approach is not usable in 
Monte Carlo simulations since they are done at constant 
temperatures and heat transfer cannot be controlled in 
canonical ensembles. As for the latter approach, the ana- 
logues of pressure and volume have not been defined for 
proteinlike systems. This is precisely what we will do. 

Recall that the equilibrium folding theory for nonrandom
(designed) heteropolymers recognizes the native
overlap $Q$ as an order parameter (for any given
conformation, $Q$ is the number of native monomer-monomer
bonds). This quantity is thermodynamically additive and
is thus similar to volume in a liquid-gas system. It is then
straightforward to define the analogue of pressure, which
we call folding pressure $P_Q$. It is the quantity conjugate
to $Q$: given the Hamiltonian $H_0$ of our protein,
applying folding pressure $P_Q$ means taking the Hamiltonian
$H = H_0 - P_Q Q$. In other words, folding pressure is an
additional energy bonus for every correct (native) bond.
The $-P_Q Q$ term can also be thought of as a perturbation
of the original Hamiltonian $H_0$ by Go interactions [12].

February 1, 2008
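
As a minimal sketch of how such an energy bonus could enter a single Monte Carlo move, assume the standard Metropolis rule with the Boltzmann constant set to 1. The function name `accept_move` and its arguments are illustrative and not taken from the original work: `d_h0` is the change in the unperturbed energy for a trial move, `d_q` is the change in the number of native contacts, and `uniform` is any callable returning a number in [0, 1).

```python
import math
import random


def accept_move(d_h0, d_q, p_q, temperature, uniform=random.random):
    """Metropolis test for H = H_0 - P_Q * Q (illustrative helper).

    d_h0: change in the unperturbed energy H_0 for the trial move
    d_q:  change in the native overlap Q for the trial move
    p_q:  folding pressure, i.e. the energy bonus per native contact
    """
    d_h = d_h0 - p_q * d_q  # total energy change, including the bonus term
    if d_h <= 0.0:
        return True  # downhill or neutral moves are always accepted
    # Uphill moves are accepted with probability exp(-dH / T), k_B = 1.
    return uniform() < math.exp(-d_h / temperature)
```

With `p_q > 0`, a move that forms native contacts (`d_q > 0`) is accepted more often than the same move at zero folding pressure, which is the sense in which the term acts as a pressure toward the native state.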

Although it may not be easy to directly realize pure 
folding pressure in real experiments, changing experi- 
mentally controlled environmental parameters, such as 
pH or denaturant concentration, affects the folding pres- 
sure as well. In this sense, examining folding pressure 
in the theoretical model is just as relevant as studying 
temperature. Besides, and even more importantly, the 
very simple idea of folding pressure allows us to exercise 
physical intuition in a new fruitful way. 

To perform Monte Carlo simulations of the folding
transition, the polymer is modeled as a self-avoiding
chain of 27 or 48 monomers on a cubic lattice. The
Hamiltonian is given by

$$H_0 = \sum_{I<J} B_{s_I s_J} \Delta(r_I - r_J),$$

where $I$, $J$ label monomers along the chain, $B_{s_I s_J}$ is the
interaction between species $s_I$ and $s_J$, $\Delta(r_I - r_J) = 1$ if
$I$ and $J$ are nearest neighbors, and $\Delta(r_I - r_J) = 0$
otherwise. We employ the model [13] in which the energies $B_{ij}$
are chosen independently from a Gaussian distribution.
The sequence of species along the chain was obtained
through simulated annealing so that the ground state of
the polymer is the native conformation.
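
The contact Hamiltonian and the native overlap Q can be evaluated directly from lattice coordinates. The sketch below is illustrative only: the helper names, the representation of a conformation as a list of integer `(x, y, z)` tuples, and the `snake` path used as a stand-in native state are all assumptions made here. The energy sums over all nearest-neighbor pairs (chain bonds contribute only a conformation-independent constant), while Q counts non-bonded contacts only. With that convention a fully compact 27-mer has 54 - 26 = 28 contacts, matching the value of Q for the folded 27-mer quoted in the text.

```python
from itertools import combinations


def is_contact(a, b):
    """True if two lattice sites are nearest neighbors on the cubic lattice."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def energy_h0(conf, species, b):
    """Contact energy: sum over I<J of b[s_I][s_J] for nearest-neighbor pairs."""
    return sum(
        b[species[i]][species[j]]
        for i, j in combinations(range(len(conf)), 2)
        if is_contact(conf[i], conf[j])
    )


def native_overlap(conf, native):
    """Q: number of non-bonded contacts of conf that also exist in native."""
    return sum(
        1
        for i, j in combinations(range(len(conf)), 2)
        if j - i > 1
        and is_contact(conf[i], conf[j])
        and is_contact(native[i], native[j])
    )


def snake(n=3):
    """Compact n x n x n serpentine path, a hypothetical native conformation."""
    path = []
    for z in range(n):
        ys = range(n) if z % 2 == 0 else range(n - 1, -1, -1)
        for k, y in enumerate(ys):
            forward = (z * n + k) % 2 == 0
            xs = range(n) if forward else range(n - 1, -1, -1)
            path.extend((x, y, z) for x in xs)
    return path
```

For example, `native_overlap(snake(), snake())` gives the maximal overlap of the compact 27-mer, and a fully stretched chain has zero overlap with it.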




[Figure 1 image; axis label: native overlap, Q]

FIG. 1. The distribution of a 27-mer over $Q$ at various
$P_Q$ at $T = 4.0$. Both $P_Q$ and $T$ are measured in units
of $\delta B$, the variance of the interaction matrix. At each $P_Q$,
the normalized distribution is given by the gray level: darker
places correspond to higher probability densities. Triangles
indicate the peaks of the distributions. The Van der Waals
isotherm in the inset shows close qualitative similarity.

The $P_Q$-$Q$ isotherms, which are obtained by performing
long Monte Carlo runs at various temperatures
and folding pressures, are strikingly similar to the $P$-$V$
isotherms for a liquid-gas system (see Figure 1). The
folding pressure appears as the $-P_Q Q$ term in addition
to the usual energy $H_0$ in the standard Metropolis
criterion. For 27-mers, runs are at least $2 \times 10^9$ (and up
to $5 \times 10^{10}$) iterations, i.e., until the distribution of
conformations over $Q$ has reached equilibrium. As we vary
the pressure at each temperature, the distribution of $Q$
changes from a monomodal distribution to a bimodal
distribution near the transition pressure and back again to a
monomodal distribution (Figure 1). The bimodal
distribution is characterized by two maxima at $Q = Q_{max}$ and
$Q = Q_u$ and a wide minimum in between. For 27-mers,
$Q_{max} = 28$, $Q_u$ varies from 2 to 8, and the minimum
is centered around $Q = 18$. $Q_{max}$ corresponds to the
folded state, and values of $Q$ near $Q_u$ correspond to the
unfolded states. The bimodal distribution occurs over a range of
pressures, thus manifesting the metastable states.
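
A bimodal histogram over Q can be turned into a free energy profile with the standard relation F(Q) = -T ln P(Q), with the Boltzmann constant set to 1. The sketch below is a hedged illustration: the function names and the dictionary-based histogram format are assumptions made here, and the counts in the usage note are made-up numbers, not data from the simulations.

```python
import math


def free_energy_profile(counts, temperature):
    """F(Q) = -T ln P(Q) from a histogram {Q: count}; empty bins are skipped."""
    total = sum(counts.values())
    return {
        q: -temperature * math.log(c / total)
        for q, c in counts.items()
        if c > 0
    }


def barrier_heights(profile, q_unfolded, q_folded):
    """Barrier seen from the unfolded and from the folded peak, as a tuple."""
    lo, hi = sorted((q_unfolded, q_folded))
    top = max(f for q, f in profile.items() if lo <= q <= hi)
    return top - profile[q_unfolded], top - profile[q_folded]
```

For instance, a hypothetical histogram `{5: 450, 18: 100, 28: 450}` has two equally populated peaks separated by a minimum at Q = 18, and `barrier_heights` then returns T ln 4.5 from either side.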

The immediate (trivial) insight following from Figure 1
is that there should be both a regime of nucleation and a
regime of spinodal decomposition. Indeed, if the system is
initially equilibrated, say, in the unfolded phase (lower left
corner in Figure 1) and folding is then caused by an instant
folding pressure quench at constant temperature, the
kinetics will proceed differently depending on whether the
system is quenched to the region where the original state
is metastable or totally unstable. In particular, we have
observed both nucleation and spinodal decomposition by
examining the time evolution of the native contacts close
to or far away from the transition temperature,
respectively. Close to the transition point the polymer
remains folded or unfolded for an extended period of time
until a particular group of native contacts, which form
the "nucleus" or "critical loop," are formed or broken,
after which the polymer rapidly folds or unfolds, as is
characteristic of nucleation. Far away from the
transition temperature, there is no longer a free energy barrier
to the (un)folded phase during (un)folding, that is, all
domains of the (un)folded phase are unstable. This can
be seen as a gradual increase of the (un)folded phase.
The spinodal decomposition scenario implies nonspecific
"homopolymer" collapse as the first stage of folding [16].

By applying folding pressure, we can bring the system 
to the transition point at any given temperature. Once 
we are at the transition point, the stage of the transi- 
tion can be controlled by varying Q. This is analogous 
to controlling the volume in a liquid-gas system by mov- 
ing the piston when the pressure is equal to the transi- 
tion pressure. This naive "piston" model is particularly 
useful due to the finite size of the system. Indeed, for
macroscopic systems, the critical nucleus size is always
much smaller than the system size. Accordingly, only an
infinitesimally small change in volume, or displacement of
the piston, would be needed to produce an equilibrium phase
segregated state in which the size of, say, the liquid phase
would be approximately that of a critical nucleus. Since
the polymers of interest are of moderate size, the ex- 
pected critical nucleus is not negligibly small compared 
to the entire system. This is why the interesting stages 
of the transition are seen when Q is significantly different 
from the values of any of the phases. 

FIG. 2. Phase segregated macrostates. Empty circles outline the native conformation. The gray level for every position shows
how frequently this position is occupied by a monomer; black means that a monomer is always there. Folding probabilities, $p$,
for microstates belonging to each of these three macrostates have fairly narrow distributions: average values $\langle p \rangle$ are shown
in each figure, while the variance is significantly smaller, about $\langle \Delta p \rangle \sim 0.05$ for all three examples.

We computationally generated the phase segregated
macrostates of a heteropolymer by running simulations
at the transition point ($T = 1.7$, $P_Q = 0.5$ for 27-mers
and $T = 2.28$, $P_Q = 1.0$ for 48-mers; both $T$ and $P_Q$ are
given in units of $\delta B$, the variance of the interaction
matrix). To mimic a fixed "piston" position, we restrict
the value of $Q$ to some chosen level $Q^*$. We first obtain
conformations at $Q = Q^*$ by doing a Monte Carlo run
with unconstrained $Q$ and collecting conformations every
time the system passes through the $Q = Q^*$ "surface." Each
of the collected conformations with $Q = Q^*$ can be used
to initiate a new restricted run in which $Q = Q^*$. The set
of microstates (individual conformations) encountered in
the restricted run forms a macrostate at the
transition temperature $T$, characterized by the given values of
$P_Q$ and $Q = Q^*$. The phase segregation is clearly seen
in Figure 2, where the orientation and position of each
microstate are chosen to give maximum superposition with
the native state. When the microstates are subsequently
superimposed upon one another, we see that there is
a set of "core" monomers, mostly superimposed upon
the native state, which do not fluctuate in position
(Figure 2). These monomers can be interpreted as the
"native" phase, while the monomers which fluctuate belong
to the "denatured" phase.
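
The restricted run and the superposition of microstates can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that the restriction is implemented by rejecting every trial move that changes Q away from the chosen level (called `q_star` here), and that the microstates have already been rotated and translated onto the native state. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def restricted_accept(q_trial, q_star, d_h0, temperature,
                      uniform=random.random):
    """Metropolis test for a run confined to the chosen Q level."""
    if q_trial != q_star:
        return False  # the "piston" is fixed, so Q may not change
    # At fixed Q the folding pressure term is a constant, so only the
    # change in the unperturbed energy matters (k_B = 1).
    return d_h0 <= 0.0 or uniform() < math.exp(-d_h0 / temperature)


def occupancy(microstates):
    """Fraction of microstates in which each lattice site holds a monomer."""
    counts = Counter(site for conf in microstates for site in set(conf))
    return {site: c / len(microstates) for site, c in counts.items()}


def core_sites(microstates, threshold=1.0):
    """Sites occupied in at least `threshold` of the aligned microstates."""
    occ = occupancy(microstates)
    return {site for site, f in occ.items() if f >= threshold}
```

The occupancy map corresponds to the gray level described in the caption of FIG. 2. Sites with occupancy close to 1 are candidates for the non-fluctuating "native" phase.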

Based on the analogy with the liquid-gas system, we
expect that phase segregated macrostates obtained at
successive values of the restricted $Q$ ("piston positions")
would be very similar to those encountered successively in
time during a real kinetic event. To test this expectation, we
employed the general method suggested in the work [17]
and measured the folding probability $p$ for each of the
microstates belonging to the phase segregated macrostate.
We found that the distribution of $p$ is indeed very
narrow for most of the macrostates. In the three
examples shown in Figure 2, the variance of $p$ is as low as
$\langle \Delta p \rangle \sim 0.05$. This is to be compared with the very broad
distributions of $p$ for the ensemble of all states with the
same value of $Q$, where $\langle \Delta p \rangle$ can be of order unity [17]. This
means that by "moving the piston" in the way described
above, we indeed drag the system along its natural
kinetic path. However, as we mentioned, this is only valid
for a majority of phase segregated macrostates. There are
important exceptions to the rule, which represent the
fundamental difference between (hetero)polymeric
systems and the more familiar liquid-gas ones.
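
The measurement of a folding probability can be sketched generically, assuming that p for a microstate is the fraction of independent trajectories started from it that reach the folded state before the unfolded one. Everything named below is illustrative. The `step` callback stands in for the real lattice dynamics, and `toy_step` is a one-dimensional random walk in Q used only to make the example runnable.

```python
import random


def folding_probability(start, step, is_folded, is_unfolded,
                        n_runs=1000, max_steps=100_000, seed=0):
    """Estimate p as the fraction of runs that fold before they unfold."""
    rng = random.Random(seed)
    folded = finished = 0
    for _ in range(n_runs):
        state = start
        for _ in range(max_steps):
            if is_folded(state):
                folded += 1
                finished += 1
                break
            if is_unfolded(state):
                finished += 1
                break
            state = step(state, rng)
    if finished == 0:
        raise ValueError("no trajectory reached either basin")
    # Runs that hit max_steps without committing are left out of the ratio.
    return folded / finished


def toy_step(q, rng):
    """Unbiased +/-1 walk in Q, a stand-in for real chain dynamics."""
    return q + rng.choice((-1, 1))
```

For the toy walk between absorbing boundaries at Q = 0 and Q = 28, a start at Q = 14 gives an estimate near 0.5. Repeating the estimate over all microstates of one macrostate gives the distribution of p whose width is discussed in the text.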




FIG. 3. Example of topologically hindered folding. The
left figure is the native conformation. The right figure
represents a phase segregated state obtained by gradually
increasing the proportion of the folded part. While 41 out of 47
polymeric bonds (87%) are successfully superimposed on the
native conformation, the denatured phase has developed a
topological constraint. Folding can proceed only if the nucleus
(folded phase) is partially destroyed.

To understand the problem, consider Figure 3. It
demonstrates one of the examples in which "pushing the 
piston" brings the system to a dead end instead of drag- 
ging it along its path from one phase to the other. The 
reason for the deadlock is purely topological: while the 
amount of the folded phase is growing at the expense of 
the denatured phase, a topological constraint has been 
formed in the latter. As a result, the system arrives at a 
conformation which cannot fold without first destroying 
a significant part of its correctly folded "native phase." 

The insight we gained from considering phase
segregated macrostates and the role of topological constraints
can now be used to produce a scaling analysis for very
long ($N \to \infty$) chains, thus approaching the problem
formulated in the work [18]. Here, we restrict ourselves to
the scaling of unfolding time, which is more instructive
in terms of the role of topological constraints. We begin
with a correctly folded native globule, and then quench
the temperature and/or folding pressure such that the
native state becomes (globally) unstable. The nucleus of
the unfolded denatured phase, which sooner or later will
appear in this system, can be imagined as a "bubble" of
polymer melt (or solution; Figure 4). This polymer
liquid consists of one or several loops. It is important to
note that on the time scale of interest, while the nucleus
remains unchanged in size, the ends of all those loops are
firmly quenched at the corresponding "root points" on
the surface of the native phase, which is frozen.

Since all ends are fixed, the topology of the loops is 
well-defined and quenched. The crucial point here is 
not the classification of the given state as "entangled" 
or "unentangled" but that the mutual positions of the 
loops are quenched in the same topological class to which
they belonged in the native state (unless a chain end
belongs to the nucleus, which is improbable in the $N \to \infty$
limit). This fact has profound consequences in terms of
the critical nucleus size and, accordingly, the unfolding 
barrier height. Indeed, normally, in a vapor-liquid
system, the nucleation free energy can be schematically written
as $-\alpha \Delta T N + \sigma N^{2/3}$, where the volume part ($N$), which
is proportional to the deviation from the transition point
$\Delta T$ (degree of overheating or overcooling in the initial
quench), is negative (favorable) and the surface part ($N^{2/3}$)
is positive (unfavorable). For a polymer, melting of the
nucleus does not release all of the volume free energy
$-\alpha \Delta T N$ because the melted part remains topologically
constrained. Thus, there appears an additional positive
contribution to the nucleus free energy. As long as the
nucleus remains small compared to the entire globule, this
new term can be estimated as $+T \ln \mathcal{M}$, where $\mathcal{M}$ is the
number of topologically different classes. Since $\mathcal{M}$ grows
exponentially with the number of monomers involved, we
end up with an extra volume term which is always
positive and independent of $\Delta T$: $-\alpha \Delta T N + \beta T N + \sigma N^{2/3}$.
To estimate $\beta$, we performed an exhaustive enumeration
of all two-loop conformations with fixed ends within a
$3 \times 3 \times 3$ cube ($N = 27$, see Figure 4) and found that
$\beta = 0.45$ is of order unity and by no means small
(contrary to an early estimate [19]). Thus, topological
constraints significantly increase the height of the barrier,
in complete agreement with our simulations (Figure^). 
As long as the temperature jumps causing unfolding, $\Delta T$,
remain finite and $N$-independent, the critical nucleus is
of the order of the entire globule, and thus the
unfolding time scales as $\exp[c N^{2/3}]$, as opposed to some
$N$-independent time for the phantom polymer, which is
allowed to freely pass through itself. As for $c$, it is a
constant, and we see no grounds to assume that it is
significantly different from unity. Our result agrees very well
with the original estimate of folding time under
equilibrium conditions given by Finkelstein, but sharply
contradicts its latest improvement [19].
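
As a worked illustration of the free energy expression in this paragraph (a sketch under stated assumptions, not the authors' derivation), treat the nucleus size as a continuous variable $n$ and introduce the shorthand $\epsilon = \alpha \Delta T - \beta T$ for the net volume gain per monomer. Both $n$ and $\epsilon$ are notation introduced here. For $\epsilon > 0$ the stationary point of the profile gives the critical nucleus and the barrier:

```latex
\begin{align*}
F(n) &= -\epsilon\, n + \sigma n^{2/3},
  \qquad \epsilon \equiv \alpha\,\Delta T - \beta T, \\
\frac{dF}{dn} &= -\epsilon + \tfrac{2}{3}\,\sigma\, n^{-1/3} = 0
  \;\Longrightarrow\;
  n^{*} = \left(\frac{2\sigma}{3\epsilon}\right)^{3}
  \quad (\epsilon > 0), \\
F(n^{*}) &= \frac{\sigma}{3}\,(n^{*})^{2/3}
  = \frac{4\sigma^{3}}{27\,\epsilon^{2}} .
\end{align*}
```

Because $n^{*}$ grows as $\epsilon^{-3}$, a $\beta T$ term of order unity that reduces $\epsilon$ pushes $n^{*}$ sharply upward, and for $\epsilon \le 0$ the expression has no maximum at finite $n$. In either case the estimate stops being valid once $n$ approaches the chain length, since the topological term was estimated only for a nucleus that is small compared to the globule.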



[Figure 4 image; label: molten nucleus]

FIG. 4. Schematic representation of an unfolding nucleus
inside a large folded globule, and its lattice model.

To conclude, we found that the study of phase
segregation occurring in an equilibrium proteinlike heteropolymer
sheds light on the possible and impossible nucleus
configurations relevant for folding and unfolding kinetics. We
found in particular that topological constraints play an
important role in determining the critical nucleus. In the
case of unfolding, topological constraints dramatically
increase the size of the critical nucleus, causing the
unfolding time to grow as $\exp[cN^{2/3}]$ with the chain length $N$.